|
Knowledge discovery and data mining are concerned with the discovery
of valid, novel, potentially useful, and understandable patterns in
data. Most data mining algorithms require that the data are
represented as a single, attribute-value table. In contrast, data
mining techniques that are developed within the framework of
Inductive Logic Programming (ILP) are applicable directly to
multi-relational databases. The application of ILP for data mining
is also termed relational data mining.
This thesis develops and investigates pruning techniques that are
applicable for pattern discovery within a restricted ILP setting
where all patterns describe subgroups of a fixed population of
individuals. This variant of pattern discovery is also known as
relational subgroup discovery. The main contribution of the thesis
is an Apriori-like search algorithm for relational subgroup
discovery. Formerly, Apriori-like search was applied only in
attribute-value (i. e., non-relational) settings.
In particular, the contributions of the thesis are (1) a formal
description of subgroup discovery in the framework of ILP, (2)
optimum estimates for the interestingness criteria distributional
unusualness and implication intensity, (3) an extension to the
well-known Apriori algorithm that allows to constrain the set of
patterns searched by Apriori, (4) an ILP language bias that allows
the application of an Apriori-like algorithm for relational subgroup
discovery, (5) an SQL-based language bias for relational subgroup
discovery via SQL queries to a relational database management
system, (6) a novel approach for integrating pruning based on
structured attributes in an Apriori-like search algorithm, (7) an
approach for integrating pruning based on discreetized numerical
attributes in an Apriori-like search algorithm, (8) an experimental
evaluation of the various pruning methods (namely, optimum
estimates, Apriori-like pruning, use of structure in attributes for
pruning), (9) the application of the approach for data mining in a
real-world financial database, and extensive comparisons with
related work.
The experiments provide a comparison of different methods for
pruning the search space for subgroup discovery in an ILP framework.
Optimum estimates and Apriori-like pruning have produced good and
reliable pruning effects, while the effect of pruning based on
structured attributes varied for different search settings. In
particular, the experiments have shown that Apriori-like pruning can
have a similarly good effect for search in a multi-relational data
base as it has for search in a single-relation database.
The application of the approach for data mining in a real-world
financial database has shown that the language bias is well suited
for the task of relational subgroup discovery, and that its
expressivity is practically useful.
A detailed english abstract is given in Appendix B of the thesis.
|